Amplify

Context

Summary Statistics - Plain

Unit                             No. Responses  Pre-Unit Mean  Post-Unit Mean  Percent Growth
Dinosaur Domestication                     813             53              87              64
Earth’s Giant Turtle                       977             30              69             129
Life on Mars                               973             31              68             120
Potions                                    972             28              70             152
Unicorn Traits and Reproduction            940             34              81             140
Unnatural Selection                        931             41              83             101

Summary Statistics - kableExtra

Unit                             No. Responses  Pre-Unit Mean  Post-Unit Mean  Percent Growth
Dinosaur Domestication                     813             53              87              64
Earth’s Giant Turtle                       977             30              69             130
Life on Mars                               973             31              68             119
Potions                                    972             28              70             150
Unicorn Traits and Reproduction            940             34              81             138
Unnatural Selection                        931             41              83             102
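
The Percent Growth column appears to be the relative change from the pre-unit mean to the post-unit mean. A minimal sketch, assuming growth = (post - pre) / pre * 100 and using the rounded means from the table above (the `summary_tbl` name is hypothetical, not from the original analysis):

```r
# Rounded means copied from the table above; `summary_tbl` is a hypothetical name.
summary_tbl <- data.frame(
  unit_title = c("Dinosaur Domestication", "Earth's Giant Turtle", "Life on Mars",
                 "Potions", "Unicorn Traits and Reproduction", "Unnatural Selection"),
  pre_mean   = c(53, 30, 31, 28, 34, 41),
  post_mean  = c(87, 69, 68, 70, 81, 83)
)

# Assumed definition: percent growth relative to the pre-unit mean.
summary_tbl$perc_growth <- round(
  (summary_tbl$post_mean - summary_tbl$pre_mean) / summary_tbl$pre_mean * 100
)

summary_tbl$perc_growth
# 64 130 119 150 138 102
```

These values match the kableExtra table. The plain table differs by a point or two for several units, presumably because it was computed from unrounded means.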

Summary Statistics Code

library(dplyr)  # provides %>% and mutate()

sim_data_pre_post %>% 
  mutate(
    # bold, enlarge, and color the post-unit means by value (viridis option "B")
    post_mean = kableExtra::cell_spec(
      post_mean, "html", bold = TRUE, font_size = 20,
      color = kableExtra::spec_color(post_mean, begin = 0, end = 0.6,
                                     direction = 1, option = "B")),
    # same styling for percent growth
    perc_growth = kableExtra::cell_spec(
      perc_growth, "html", bold = TRUE, font_size = 20,
      color = kableExtra::spec_color(perc_growth, begin = 0, end = 0.6,
                                     direction = 1, option = "B"))
  ) %>% 
  knitr::kable(format = "markdown", align = "r", 
               col.names = c("Unit", "No. Responses", "Pre-Unit Mean", 
                             "Post-Unit Mean", "Percent Growth")) %>%
  kableExtra::kable_styling(position = "left")

Boxplot
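
The deck shows no code for the boxplot. A minimal sketch of how one might draw it with ggplot2, assuming data shaped like `sim_data_post` (one row per response, with `unit_title` and `score` columns). The `toy_post` data below is simulated for illustration only and is not the talk's data:

```r
library(ggplot2)

# Hypothetical stand-in for sim_data_post: one row per student response.
set.seed(42)
toy_post <- data.frame(
  unit_title = rep(c("Potions", "Life on Mars", "Unnatural Selection"), each = 50),
  score      = pmin(100, pmax(0, c(rnorm(50, 70, 15),
                                   rnorm(50, 68, 15),
                                   rnorm(50, 83, 12))))  # clamp to 0-100
)

p <- ggplot(toy_post,
            aes(x = reorder(unit_title, score, FUN = median),  # order by median
                y = score)) +
  geom_boxplot() +
  coord_flip() +                      # horizontal boxes, unit names on the y axis
  labs(x = "", y = "Score") +
  theme_minimal()

p
```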

Ridgeplot

Ridgeplot Code

library(ggplot2)

sim_data_post %>% 
  ggplot(aes(score, 
             reorder(unit_title, -score))) +      # order units by mean score
  ggridges::geom_density_ridges(
                      scale = 1,                  # ridge height/overlap 
                      rel_min_height = 0.001) +   # trim tails below 0.1% of peak height
  scale_x_continuous(limits = c(0, 100),          # limits - set cutoff 
                     expand = c(0.01, 0)) + 
  scale_y_discrete(expand = c(0.2, 0)) +          # padding around graph 
  labs(x = "Score", y = "") + 
  theme_minimal()

Stacked Bars

Stacked Bars Code

sim_data_pre_post %>% 
  group_by(unit_title, assessment, score_level) %>% 
  summarise(count = n(), .groups = "drop") %>%             # responses per score level
  ggplot(aes(assessment, count)) +
  geom_col(position = "fill", aes(fill = score_level)) +   # bars rescaled to proportions
  facet_wrap(~unit_title) +
  scale_fill_manual("Score Level", 
                    values = c("1" = "#4d5050", "2" = "#c2c5c6", 
                               "3" = "#f2ac80", "4" = "#F37321")) +
  labs(x = "", y = "") +
  theme_bw() + 
  theme(panel.border = element_blank(), 
        panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(), 
        axis.line = element_line(colour = "black"), 
        strip.text.x = element_text(size = 7))
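
With `position = "fill"`, each bar is rescaled to a height of 1, so the segments show the share of responses at each score level rather than raw counts. A small base R sketch of that calculation, using hypothetical toy data (`toy` is not from the talk):

```r
# Hypothetical toy data: one row per student and assessment.
toy <- data.frame(
  assessment  = rep(c("pre", "post"), each = 5),
  score_level = factor(c(1, 1, 2, 3, 4,  2, 3, 3, 4, 4), levels = 1:4)
)

counts <- table(toy$assessment, toy$score_level)  # rows: assessment, cols: score level
props  <- prop.table(counts, margin = 1)          # divide each row by its row total

props
rowSums(props)  # every bar sums to 1
```

Because each bar is normalized separately, units with different numbers of responses stay comparable across facets.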

Alluvial Plot

Alluvial Plot Code

sim_data_pre_post %>% 
  ggplot(aes(x = assessment,               # categorical x var (pre or post)
           stratum = score_level,          # categorical var (score level)
           alluvium = user_unit,           # individual/unit 
           fill = score_level,             # color of fill
           label = score_level)) +         # text label for each stratum
  ggalluvial::geom_flow(stat = "alluvium", 
              aes.flow = "forward",        # flows take the color of their starting stratum
              alpha = 1) +                 # opacity of fill 
  ggalluvial::geom_stratum() +
  labs(x = "") +
  facet_wrap(~unit_title, 
             scales = "free_y") +          # let y scale vary by facet
  scale_fill_manual("Score Level",
                     values = c("1" = "#4d5050", "2" = "#c2c5c6", 
                               "3" = "#f2ac80", "4" = "#F37321"))  # ...

Regression Output

## 
## Call:
## lm(formula = post ~ pre + time + dino + turtle + life_on_mars + 
##     potions + unnatural_selection, data = sim_data_pre_post1)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -68.47  -5.77   1.56   7.99  32.81 
## 
## Coefficients:
##                     Estimate Std. Error t value             Pr(>|t|)    
## (Intercept)          27.4345     8.6573    3.17              0.00154 ** 
## pre                   0.4638     0.0151   30.70 < 0.0000000000000002 ***
## time                  0.4022     0.1714    2.35              0.01899 *  
## dino                 -4.6815     0.7020   -6.67       0.000000000028 ***
## turtle               -7.9888     0.6098  -13.10 < 0.0000000000000002 ***
## life_on_mars         -9.1270     0.6092  -14.98 < 0.0000000000000002 ***
## potions              -5.9409     0.6169   -9.63 < 0.0000000000000002 ***
## unnatural_selection  -2.2635     0.6107   -3.71              0.00021 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 13 on 5598 degrees of freedom
## Multiple R-squared:  0.338,  Adjusted R-squared:  0.337 
## F-statistic:  408 on 7 and 5598 DF,  p-value: <0.0000000000000002
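
To read the coefficients, it can help to plug in numbers. A minimal sketch, assuming the omitted unit dummy (apparently Unicorn Traits and Reproduction, since five of the six units have a term) is the reference category. `predict_post` is a hypothetical helper, and the units of `time` are not stated in the output:

```r
# Coefficients copied from the regression output above.
coefs <- c("(Intercept)" = 27.4345, pre = 0.4638, time = 0.4022,
           dino = -4.6815, turtle = -7.9888, life_on_mars = -9.1270,
           potions = -5.9409, unnatural_selection = -2.2635)

# Hypothetical helper: predicted post-unit score for one student.
# `unit` is one of the dummy names, or NULL for the omitted reference unit.
predict_post <- function(pre, time, unit = NULL) {
  pred <- coefs[["(Intercept)"]] + coefs[["pre"]] * pre + coefs[["time"]] * time
  if (!is.null(unit)) pred <- pred + coefs[[unit]]
  pred
}

predict_post(pre = 50, time = 0)                         # reference unit: 50.6245
predict_post(pre = 50, time = 0, unit = "life_on_mars")  # 41.4975, about 9.1 lower
```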

Regression Visualization

Regression Visualization Code

reg1 <- broom::tidy(reg1) %>%             # tidy regression output
  mutate(term = c("(Intercept)",          # make output variables readable
                  "Pre-Unit Score", 
                  "Time in Costume", 
                  "Dinosaur Domestication", 
                  "Earth's Giant Turtle", 
                  "Life on Mars", 
                  "Potions", 
                  "Unnatural Selection"))

dotwhisker::dwplot(reg1) +                 # make dot and whisker plot 
  theme_minimal() +
  theme(legend.position = "none") +        # remove legend
  geom_vline(xintercept = 0,               # set reference line at zero 
             colour = "grey60",            # make reference line grey
             linetype = 2) +               # make reference line dashed
  scale_x_continuous(limits = c(-15, 5))   # set limits for the x axis 
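
Each whisker is a confidence interval around the coefficient. A rough sketch of that calculation, assuming a normal-approximation 95% interval (estimate plus or minus about 1.96 standard errors), which is what `dwplot` appears to draw by default when the tidied data has no interval columns:

```r
# Estimates and standard errors copied from the regression output.
est <- data.frame(
  term      = c("pre", "time", "unnatural_selection"),
  estimate  = c(0.4638, 0.4022, -2.2635),
  std.error = c(0.0151, 0.1714, 0.6107)
)

z <- qnorm(0.975)                                  # about 1.96 for a 95% interval
est$conf.low  <- est$estimate - z * est$std.error
est$conf.high <- est$estimate + z * est$std.error

# TRUE when the whisker stays on one side of the zero reference line.
est$excludes_zero <- est$conf.low > 0 | est$conf.high < 0
est
```

A whisker that does not cross the dashed line at zero corresponds to a coefficient that is significant at roughly the 5% level.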

Takeaways

  • Visualize when possible!
  • Think about your design
  • Make use of color
  • Think about the general takeaway
  • Use the same language as your audience
  • Use R Markdown

Thank Yous

  • Soumya Kalra for telling me I was giving a talk 😱
  • My awesome data science team (Samuel Crane, Tashi Lama, and Harry Gamble) for listening to me give this talk and giving great feedback
  • My awesome data scientist husband (Sebastian Teran Hidalgo) for also listening to me give this talk and giving great feedback
  • My coworker Jesse Vogel for helping me change the CSS for the ioslides table output